Abstract

The validity of statistical analyses applied to identify 
different factors in many fields depends upon the use of 
appropriate sample sizes, the lack of which reduces the 
power of the findings. However, the number of cases 
collected for data analysis in medical studies is generally 
limited, for medical, financial and other experimental 
reasons, and statistical tests are often carried out without 
power and sample size estimation. Power analysis involves 
several parameters, the most important of which, the effect 
size, reflects the degree of the effect expected to be found in 
the study. We constructed an easy-to-use MS Excel calculator to determine the effect size for chi-square tests based on 2x2, 2x3 and 2x4 contingency tables, and compared its results with those given by GPower, R and, for a 2x2 table, SAS software, demonstrating the practical use of this calculation tool in three studies involving various data.
Introduction

The independence of two categorical grouping variables can be investigated by creating a contingency table [Pearson, 1904]. p-values for the comparison of these categorical variables can be calculated with chi-square tests, the statistical method most frequently used to detect differences between proportions [Agresti, 2002], which is supported by most statistical software.

Power calculation has recently received increasing attention in statistical analysis as an essential tool for determining an appropriate sample size, or for obtaining a power value that indicates the reliability of a statistical test based on preliminary data [Cohen, 1988; Gordon et al., 2002]. Power is considered highly important in experimental design, where it is applied before data collection, but it is itself also based on preliminary information (which can be obtained from a pilot study) [Osmena, 2010].

Most statistical tests have their own calculation methods for power estimation; in this paper, we focus on power calculations for the chi-square test, which are well known [Cohen, 1988]. To determine the power of a statistical test, a few input parameters have to be set, the most crucial of them being the effect size. The determination of the effect size is a medical problem: it is generally given by a physician or researcher on the basis of earlier experience. When the examined field is new and there are no previous results from which an appropriate effect size can be derived, it can be determined from new data, but only with the use of a preliminary data set.

The power of a statistical test is the probability that the test correctly finds a difference between the investigated variables (i.e. correctly rejects the null hypothesis) when such a difference exists.

There are several arguments that a post-hoc power calculation is meaningless [Hoenig et al., 2001; Lenth, 2001]. When power is calculated in research, this is mostly done retrospectively, after the experiment has been finished, which is as much a mistake as power analysis is useful in the design phase. Hoenig et al. showed why there is no reason to run a power analysis when a result was significant [Hoenig et al., 2001]. Researchers have to handle this topic with foresight.

www.srl-journal.org

Power analysis is available in several free software packages (e.g. R or GPower). A comparison of power analysis for the chi-square test in the R and SAS systems revealed both advantages and disadvantages of the methods applied by the two programs [Osmena, 2010]. However, these packages require the effect size as an input parameter. If it is not given by an expert but a preliminary data set is available, the effect size can be calculated from it.

To simplify effect size calculation from a contingency table, we constructed an MS Excel calculator (Appendix II) and compared it with calculations in R (an R script for a 2x2 contingency table is given in Appendix III), in GPower software and, for 2x2 contingency tables, in SAS.

Other effect size calculators for the chi-square test are freely available on the internet (including MS Excel-based ones) [DeFife, 2009; Ellis, 2009; Wilson, 2001], but these mostly use the formula based on the chi-square test statistic and the total number of cases, rather than the cell frequencies of a contingency table, which is the approach we applied [Cohen, 1988]. These calculators are mainly designed for 2x2 contingency tables, whereas we have extended the dimensions to 2x3 and 2x4 tables.

Aim 

Our primary aim was to demonstrate, in different studies, how the effect size needed for power estimation of chi-square tests can be evaluated from pilot data with an MS Excel-based calculator. A further goal was to run power calculations on additional experimental data, using the calculated effect size. The easy-to-use MS Excel calculator was compared with other software (R, GPower and SAS; see the section Programs Used in the Methods) in order to validate our results.

Methods 

Definitions 

The null hypothesis of the chi-square test for independence can be tested with the following statistic:

$$\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}$$

where:

Oi = observed frequency

Ei = expected frequency, asserted by the null hypothesis

m = number of cells [Osmena, 2010].

"The distribution of the χ²-statistic follows a central chi-square distribution when the null hypothesis is true. When the null hypothesis is false it follows a noncentral chi-square distribution with the noncentrality parameter, λ", where [Osmena, 2010]

$$\lambda = (\text{effect size})^2 \times (\text{total sample size}) = w^2 N$$

Therefore, the power can be defined as the probability [Osmena, 2010]

$$1-\beta = P\left(\chi^2(df, \lambda) > \chi^2_{1-\alpha}(df)\right)$$
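The power definition above can be evaluated without a statistics library. The following Python sketch is our own illustration, not the code of any package, and the function names chi2_sf, chi2_isf, ncx2_sf and chisq_power are hypothetical. It computes the same quantity as pwr.chisq.test() in R: the critical value of the central chi-square distribution is found by bisection, and the noncentral upper tail with λ = w²N is evaluated as a Poisson(λ/2) mixture of central chi-square tails. It assumes an integer df, which always holds for df = (r-1)(c-1).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a central chi-square with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term = total = math.exp(-y)
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    # Odd df: start from erfc(sqrt(y)) (df = 1) and add
    # y^a * exp(-y) / Gamma(a + 1) for a = 1/2, 3/2, ...
    total, a = math.erfc(math.sqrt(y)), 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return total


def chi2_isf(alpha: float, df: int) -> float:
    """Critical value chi2_{1-alpha}(df), found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # bracket the critical value
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_sf(x: float, df: int, nc: float) -> float:
    """Noncentral upper tail as a Poisson(nc / 2) mixture of central tails."""
    if nc <= 0:
        return chi2_sf(x, df)
    h = nc / 2.0
    j_max = int(h + 12.0 * math.sqrt(h) + 30)  # truncation of the Poisson sum
    return sum(
        math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) * chi2_sf(x, df + 2 * j)
        for j in range(j_max + 1)
    )


def chisq_power(w: float, n: int, df: int, alpha: float = 0.05) -> float:
    """1 - beta = P(chi2(df, lambda) > chi2_{1-alpha}(df)), lambda = w^2 * N."""
    return ncx2_sf(chi2_isf(alpha, df), df, w * w * n)


print(round(chisq_power(0.0933643, 362, 1), 3))  # 0.427
```

For the anthropometric data of Study 3 (w = 0.0933643, N = 362, df = 1) this gives about 0.427, the value reported by GPower for that dataset.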

As the chi-square test is used to check the 
independence between investigated categorical 
variables, two-sided tests are carried out during 
analyses. 

The function pwr.chisq.test() in the pwr library of R uses the following parameters to perform power analysis:

- Effect size
- Total number of observations
- Degrees of freedom
- Level of significance
- Power

Effect size (w): This is the size of the effect that is expected to be found in the study [Osmena, 2010]. From another aspect, it is a pure number that increases with the degree of discrepancy between the distributions under the alternative and the null hypothesis (viz. the departure from the null hypothesis that we expect to detect) [Cohen, 1988]; in other words, it is the difference between the investigated populations that is considered relevant from the standpoint of the specific (i.e. clinical) field. From the genetics point of view, w is a measure of the separation of individual phenotypes based on their genotypes [Gordon et al., 2005].

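In R, the ES.w1() and ES.w2() functions of the pwr package can be used to calculate w [Osmena, 2010]: ES.w1() takes the cell probabilities under the null and the alternative hypothesis, whereas ES.w2() takes a two-way table of cell probabilities.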

Total number of observations (N): This is the number 
of all nonmissing cases in a study that compares two 
grouping variables (and thus the overall number of 
cases in the investigated contingency table). 

Degrees of freedom (df): This is the number of rows in the contingency table minus 1, multiplied by the number of columns minus 1, i.e. df = (r - 1)(c - 1) for a table with r rows and c columns (viz. the number of cells that must be known in order to calculate all the others, given the row and column totals).
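As a minimal sketch (our own Python illustration, not the software used in this study), the function below computes the expected counts Ei from the row and column totals, the Pearson statistic defined at the start of this section and df = (r - 1)(c - 1) for an r x c table of counts. The name pearson_chi_square is hypothetical. The example uses the anthropometric counts 104, 80, 84 and 94 of Figure 6, assuming they are arranged with 104 and 94 on the diagonal.

```python
from typing import List, Tuple


def pearson_chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df


# Anthropometric counts (assumed layout: 104 and 94 on the diagonal)
stat, df = pearson_chi_square([[104, 80], [84, 94]])
print(round(stat, 3), df)  # 3.156 1
```

Dividing the statistic by N gives w² (here 3.156/362 ≈ 0.0087, i.e. w ≈ 0.093), which links the chi-square statistic to Cohen's effect size w.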

Level of significance (α): This is also called the Type I error rate; it is the probability of obtaining a significant test result when there is no real difference between the compared populations (viz. the probability of rejecting the null hypothesis when it is true, i.e. of a false-positive result).

Power (1-β): "The power of a statistical test is the probability of correctly rejecting the null hypothesis when it is false" [Osmena, 2010]. β is the probability of failing to obtain a significant difference between the investigated populations with the statistical test when there is a real difference in the background. It is also known as the Type II error rate: the probability of a false-negative result, i.e. of failing to reject the null hypothesis when it is false. In any statistical test, a low Type II error rate, and thus a high power, is desirable (as β approaches 0, 1-β approaches 1).

Any four of these parameters determine the fifth. Hence, when we are interested in estimating the power, we set the values of w, N, df and α for our study.

Relationships between these parameters:

For fixed N, α and df, a large w yields a large 1-β. By contrast, a small w yields a low 1-β, and more samples are needed to detect it.

As α increases, β decreases and power increases, and vice versa. Consequently, for a given power, the required N decreases as α increases and increases as α decreases [Osmena, 2010].
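Because any four parameters determine the fifth, the N required for a target power can be found by searching over N. The Python sketch below is ours, and the function names are hypothetical; the power routine repeats the Poisson-mixture approach for the noncentral chi-square so that the block is self-contained. It doubles N until the target power is reached and then bisects over integers, relying on the power increasing with N for fixed w, df and α. It assumes w > 0.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a central chi-square with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        term = total = math.exp(-y)
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    total, a = math.erfc(math.sqrt(y)), 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return total


def chisq_power(w: float, n: int, df: int, alpha: float = 0.05) -> float:
    """Power of the chi-square test with noncentrality lambda = w^2 * N."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:        # bracket the critical value
        hi *= 2.0
    for _ in range(100):                  # bisection for chi2_{1-alpha}(df)
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if chi2_sf(mid, df) > alpha else (lo, mid)
    crit = (lo + hi) / 2.0
    h = w * w * n / 2.0                   # Poisson mean = lambda / 2
    if h <= 0:
        return chi2_sf(crit, df)
    j_max = int(h + 12.0 * math.sqrt(h) + 30)
    return sum(
        math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) * chi2_sf(crit, df + 2 * j)
        for j in range(j_max + 1)
    )


def required_n(w: float, df: int, alpha: float = 0.05, target: float = 0.8) -> int:
    """Smallest integer N with power >= target (assumes w > 0)."""
    hi = 2
    while chisq_power(w, hi, df, alpha) < target:  # double until reached
        hi *= 2
    lo = hi // 2
    while lo < hi:                                  # integer bisection
        mid = (lo + hi) // 2
        if chisq_power(w, mid, df, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return hi


print(required_n(0.07080762, df=2))  # expected close to 1922
```

With the Hungarian effect size of Study 1 (w ≈ 0.0708, df = 2, α = 0.05) the search should land close to the 1922 cases reported in the Results.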

To estimate the effect size for the chi-square test, an MS Excel-based calculator has been constructed for 2x2, 2x3 and 2x4 contingency tables, applying Cohen's formula

$$w = \sqrt{\sum_{i=1}^{m} \frac{(P_{1i} - P_{0i})^2}{P_{0i}}}$$

where:

P1i: "the proportion in cell i posited by the alternate hypothesis" (observed proportion of element i in the cross-classification table).

P0i: "the proportion in cell i posited by the null hypothesis" (expected proportion of element i in the cross-classification table).

m: the number of cells in the contingency table [Cohen, 1988; Osmena, 2010].
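Cohen's formula can be applied directly to a table of counts, mirroring the steps displayed by the MS Excel calculator (observed cell probabilities and cell probabilities under no association). In the Python sketch below (our own; the name cohen_w is hypothetical), P1i are the observed cell proportions and P0i are the products of the row and column marginal proportions, i.e. the proportions expected under independence. Applied to the Hungarian counts of Table A2, it should reproduce the value 0.07080762 returned by ES.w2() in Figure A1.

```python
import math
from typing import List


def cohen_w(table: List[List[float]]) -> float:
    """Cohen's w for an r x c table of counts.

    P1 (alternative) = observed cell proportions; P0 (null) = product of the
    row and column marginal proportions, i.e. proportions under independence.
    """
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            p1 = count / n
            p0 = row_p[i] * col_p[j]
            total += (p1 - p0) ** 2 / p0
    return math.sqrt(total)


hungarian = [[19, 38], [73, 102], [47, 61]]  # CC, CT, TT by HC, AD (Table A2)
print(round(cohen_w(hungarian), 4))          # 0.0708
```

Since the sum under the square root equals χ²/N, w can equivalently be obtained as √(χ²/N), which is the formula used by the calculators cited in the Introduction.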

Our MS Excel calculator was compared with the R and 
GPower programs. For the 2x2 contingency tables 
used in two studies, the power estimation was also 
compared with SAS, and an R script was written 
(Appendix III). 

Without calculating the effect size, Cohen's conventional values for small, medium and large w (0.1, 0.3 and 0.5, respectively) can also be used, depending on how large a difference is expected to be found in the study [Cohen, 1988; Osmena, 2010].
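As a rough worked example of what these conventional values imply (our own illustration, assuming df = 1, α = 0.05 and a target power of 0.8, and neglecting the negligible lower rejection tail): with one degree of freedom the noncentral chi-square variable is the square of a normal variable with mean √λ, so the required noncentrality is λ ≈ (z(1-α/2) + z(1-β))², and N follows from λ = w²N:

$$\lambda \approx (1.960 + 0.842)^2 \approx 7.85, \qquad N \approx \frac{7.85}{w^2} \approx 785,\ 87,\ 31 \quad \text{for } w = 0.1,\ 0.3,\ 0.5$$

Tables with more degrees of freedom require a larger λ for the same power, and hence more cases.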

The following sections present short descriptions of 
the three studies. 

Study 1: Alzheimer's Disease Data 

Alzheimer's disease (AD) is a neurodegenerative 
disorder, the most prevalent form of dementia [Selkoe, 
2001], in the development of which genetic factors 
play an important role. Identification of AD 
susceptibility genes would markedly promote an 
understanding of the pathophysiology of the disease 
[Bekris et al., 2010]. 

An important candidate gene for AD is 24-dehydrocholesterol reductase (DHCR24), whose polymorphism rs600491 involves a single nucleotide change resulting in 2 alleles (C and T) and 3 genotypes (CC, CT and TT) [Lamsa et al., 2007].

Lamsa et al. investigated the DHCR24 rs600491 polymorphism as regards the risk of AD in a Finnish sample and found that the CC, CT and TT genotype frequencies did not differ significantly between the AD and healthy control (HC) female groups (N(HC females)=274, N(AD females)=289; P(CC|HC)=0.11, P(CC|AD)=0.11; P(CT|HC)=0.43, P(CT|AD)=0.48; P(TT|HC)=0.46, P(TT|AD)=0.41; Fisher's Exact test p=0.44) [Lamsa et al., 2007]. A Hungarian examination likewise revealed no significant difference in DHCR24 rs600491 genotype distribution between the AD and HC female populations (N(HC females)=139, N(AD females)=201; P(CC|HC)=0.137, P(CC|AD)=0.189; P(CT|HC)=0.525, P(CT|AD)=0.507; P(TT|HC)=0.338, P(TT|AD)=0.303; χ² test p=0.426) [Feher et al., 2012].

We calculated the effect size and power for both female datasets and compared the results.
Study 2: Physiotherapeutic Data

During an informatics project between September 2009 
and May 2011, a system was developed by Calculus 
Ltd. and Polygon Informatics Ltd. to facilitate 
physiotherapeutic examinations on the human body 
with different wireless biosensors and with the use of 
other parameters collected on patients. The Physiosensor system is connected to an SAS server, where the data collected and saved during the examinations can be analyzed via an online user interface. The first set of investigations in the project is related to knee joint straightening. During the test, 3 EMG sensors with an earthing strap (measuring muscle activity), a goniometer (measuring the flexibility of a joint) and an event marker (marking time points) were placed on specific muscles and parts of the leg. The patient, in a sitting position, had to straighten the knee according to a specific protocol. Before the measurement, personal and other health data were collected. For the estimation of w and power, we used only gender, compared with a grouping variable based on the patient's birth date. This comparison by a chi-square test gave a nonsignificant p-value, and the effect size and statistical power could therefore be investigated.

Study 3: Anthropometric Data 

Anthropometric parameters of first-year university 
students (Faculty of Medicine, University of Szeged, 
Hungary; aged around 18-19), including the hip and 
waist circumferences, were measured three times in 2010. The averaged values of these parameters were categorized, the resulting counts were compared with the chi-square test, and w and 1-β were then estimated.

Programs Used 

GPower 3.1.5 (Heinrich-Heine-Universität, Düsseldorf, Germany), RStudio 0.97.248 (RStudio Inc., Boston, MA, USA) as a user-friendly interface for running R (R Foundation for Statistical Computing, Vienna, Austria) scripts, SAS 9.2 (SAS Institute Inc., Cary, NC, USA) and IBM SPSS Statistics 20 (IBM Corp., Armonk, NY, USA) were used for statistical analyses. The MS Excel-based calculator was developed in Microsoft Office 2010 (Microsoft Hungary Ltd., Budapest, Hungary).

Results 

Study 1: Alzheimer's Disease Data 

The Finnish dataset comprised a total of 563 females (289 AD and 274 HC subjects; Table A1 in Appendix I), while the Hungarian DHCR24 rs600491 genotype frequency data comprised 340 females overall (201 AD and 139 HC subjects; Table A2 in Appendix I).

We calculated w with the ES.w2() R function and then the observed power for the two datasets with the pwr.chisq.test() function. The Finnish dataset gave w_Finnish=0.054, with 1-β=0.1918 among 563 cases at a 0.05 significance level. To achieve a power of 0.8 (80%), 3299 cases would have to be examined if the other parameters remained constant.

The Hungarian dataset gave w_Hungarian=0.071, with 1-β=0.198 among 340 cases at a 0.05 significance level. To achieve a power of 0.8, 1922 cases would have to be examined if the other parameters remained constant.
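The two sample sizes can be checked by hand from λ = w²N. This worked example is ours: it assumes that, for df = 2 and α = 0.05, a power of 0.8 requires a noncentrality of about λ ≈ 9.635, and it uses w to one more decimal than reported above (w_Finnish ≈ 0.05404 and w_Hungarian ≈ 0.07081, recomputed from Tables A1 and A2):

$$\lambda_{\text{Hungarian}} = 0.07081^2 \times 340 \approx 1.70, \qquad N_{\text{Hungarian}} \approx \frac{9.635}{0.07081^2} \approx 1922$$

$$\lambda_{\text{Finnish}} = 0.05404^2 \times 563 \approx 1.64, \qquad N_{\text{Finnish}} \approx \frac{9.635}{0.05404^2} \approx 3299$$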

Using w from the pilot (Finnish) dataset in the power analysis of the Hungarian dataset, a power of 0.132 was obtained, which is lower than that calculated with the w based on the Hungarian data themselves. However, all the results indicate a very low power.

With α and df fixed, a larger w requires a smaller N to reach the same power. Here, we wished at least to emphasize this difference in the calculations. As no clinically relevant w is available, the w calculated from the preliminary Finnish dataset can be used (this is what researchers mostly do) as if it were the "true" effect size, although it is not. Since w from the Finnish dataset is available, there is no need to calculate w from the Hungarian dataset itself. This example shows that w can differ between similar studies, so a clinically relevant w value would be the appropriate choice. It is therefore highly important to handle post-hoc calculations with foresight; our comparisons here were purely computational.

Figure 1 depicts the observed power (vertical axis) for the Finnish data (empty dots) and the Hungarian data (filled dots) versus the number of cases (horizontal axis). The commonly used GPower software [Erdfelder et al., 1996] applies the same Cohen formulas for w and power estimation for chi-square tests [Cohen, 1988] as R. Our MS Excel-based calculator gave the same effect size as GPower for the Hungarian data (Figure 2), whereas estimating w in GPower itself is cumbersome. Figure A1 in Appendix I shows the R code, with the resulting effect size and power, for comparison with Figure 2, using the same colors (in this paper, magenta indicates the effect size and orange the power).

As SAS calculates power only for 2x2 contingency tables, the power analysis for the Alzheimer's disease study could not be performed in SAS.
[Figure 1: power (vertical axis) versus number of women (horizontal axis, up to 5000); Finnish data shown as empty dots, Hungarian data as filled dots.]

FIG. 1 POWER ESTIMATION (VERTICAL AXIS) FOR FINNISH AND HUNGARIAN FEMALE GENOTYPE FREQUENCY DATA WITH INCREASING TOTAL NUMBER OF FEMALES (HORIZONTAL AXIS). α=0.05, df=2, w_HUNGARIAN=0.071, w_FINNISH=0.054. A GREATER POWER CAN BE ACHIEVED WITH THE HUNGARIAN THAN WITH THE FINNISH DATA BASED ON THE SAME NUMBER OF PATIENTS. FIGURE WAS CREATED IN R.
FIG. 2 COMPARISON OF OUR RESULTS OBTAINED WITH THE MS EXCEL CALCULATOR (LEFT-HAND PANEL) AND WITH THE GPOWER SOFTWARE. IN THE EXCEL CALCULATOR, ONLY THE RED-FRAMED CONTINGENCY TABLE NEEDS TO BE COMPLETED; THE OTHER
Study 2: Physiotherapeutic Data

The physiotherapeutic dataset was collected from the database of the Physiosensor system. As an increasing number of institutes now use the system for various medical studies (e.g. the University of Debrecen, Debrecen, Hungary, for neurological patients, or a hospital in Csepel, Hungary), we selected only the cases examined in the Physiosensor laboratory (N=42). Of these 42 observations, 34 involved the knee joint straightening protocol between February 2 and April 19, 2011. To create a grouping variable, we split the cases at the median birth date: those born in or before 1986, and those born in 1987 or later.

As a demonstration of power analysis, we selected the 29 cases recorded before April 1, 2011 as a pilot dataset and ran a chi-square test on them. The contingency table for this pilot physiotherapeutic dataset can be found in Table A3 in Appendix I.

The Pearson chi-square test resulted in a two-sided 
asymptotic p-value of 0.573 (the 2-sided Fisher's Exact 
test gave p=0.715), showing no significant difference 
between the gender and birth date groups based on 
this dataset at a 0.05 significance level. 

The data selection, variable transformation and chi-square tests were carried out in IBM SPSS Statistics 20.

R gave w=0.105, and with this value, N=29 patients and df=1 (a 2x2 contingency table), at α=0.05, 1-β=0.0872 was obtained.

For the overall period from February to April 2011, the contingency table is shown in Table A4 in Appendix I.
The Pearson chi-square test resulted in a two-sided 
asymptotic p-value of 0.746 (the 2-sided Fisher's Exact 
test gave p=1.000). 

This extended study gave w=0.056 (almost half of the value based on the preliminary data), which at α=0.05 with df=1 leads to a power of 0.0621 for 34 observations. When the power for the extended dataset (N=34) was calculated with the effect size of the preliminary dataset of 29 cases, a power of 0.0937 was found.
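With df = 1 these values can also be checked by hand (our own worked example): the noncentral chi-square with one degree of freedom is the square of a normal variable with mean √λ, so the power reduces to two normal tail areas, where Φ is the standard normal distribution function and z(1-α/2) = 1.960 for α = 0.05:

$$1-\beta = \Phi\left(\sqrt{\lambda} - z_{1-\alpha/2}\right) + \Phi\left(-\sqrt{\lambda} - z_{1-\alpha/2}\right)$$

For w = 0.105 and N = 34, λ = 0.105² × 34 ≈ 0.375 and √λ ≈ 0.612, giving 1-β ≈ Φ(-1.348) + Φ(-2.572) ≈ 0.089 + 0.005 ≈ 0.094, in agreement (up to the rounding of w) with the power of 0.0937 obtained above.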

All power estimations based on the physiotherapeutic datasets, which involve a very low N, indicated almost no power for testing the independence of gender and birth date groups from these count data.

Figure 3 compares the 1-β estimations for the preliminary and extended datasets with increasing N, using the w values calculated separately for the pilot study and the extended study. It is clear that, when N is very low, even a small increase in N can cause a large difference in the estimated w and hence in the power estimation.

[Figure 3: Comparison of preliminary and extended Physiosensor frequency datasets; power estimation with increasing number of cases.]

FIG. 3 COMPARISON OF POWER ESTIMATION (VERTICAL AXIS) OF PRELIMINARY (N=29) AND EXTENDED (N=34) PHYSIOTHERAPEUTIC DATASETS WITH INCREASING NUMBERS OF PATIENTS (HORIZONTAL AXIS). w IS CALCULATED SEPARATELY FOR THE TWO STUDIES. α=0.05, df=1, w_PRELIMINARY=0.105, w_EXTENDED=0.056. FIGURE WAS GENERATED IN R.

Figure 4 compares the results obtained with the MS Excel-based calculator and with GPower. The effect size and power calculations were also compared with R; the script is presented in Figure A2 in Appendix I.

We also computed 1-β in SAS, using the w obtained from the preliminary data to estimate the power for the extended dataset. The SAS syntax for this analysis can be found in Figure A3 in Appendix I. Figure 5 illustrates the output of the SAS syntax in Figure A3: at N=34, SAS calculated 1-β=0.092. The same calculation in R gave 1-β=0.0937 (Figure A4 in Appendix I).



[Figure 5: SAS System output of The POWER Procedure, Pearson Chi-square Test for Two Proportions. Fixed Scenario Elements: asymptotic normal distribution, normal approximation method, alpha 0.05, two-sided test, group weights. Computed Power table listing the power by N Total.]

FIG. 5 THE RESULT OF POWER ANALYSIS IN THE SAS SYSTEM (MAGENTA-FRAMED).



Study 3: Anthropometric Data 

To demonstrate the calculation of w and 1-β for the chi-square test, we used the anthropometric dataset from 2010 (N=362; 181 men and 181 women): the mean waist and mean hip circumferences (each measured in cm three times per person) were divided into 2 categories at their medians (90 cm for the hips and 79 cm for the waist). The contingency table (Table A5 in Appendix I) has no cell with an expected count below 5, so it meets the assumption of the chi-square test and the result can be interpreted.

Table A6 in Appendix I contains the results of Pearson's chi-square test, showing no significant difference between these two categorical variables at the 5% significance level (p=0.076).
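For a 2x2 table, w reduces to the absolute value of the phi coefficient, w = √(χ²/N). As a worked check (ours), assuming the counts of Figure 6 are arranged with 104 and 94 on the diagonal, which gives χ² = Nw² ≈ 3.16 and so reproduces the reported p = 0.076:

$$w = \frac{|ad - bc|}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} = \frac{|104 \cdot 94 - 80 \cdot 84|}{\sqrt{184 \cdot 178 \cdot 188 \cdot 174}} = \frac{3056}{32732} \approx 0.0934$$

This agrees with the w = 0.093 given by the MS Excel calculator and GPower for this dataset.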

Figure 6 depicts a bar chart of the frequencies in the contingency table, which is a good graphical representation of such count data. When the waist circumference is below its median (79 cm), slightly more subjects also have a hip circumference below its median, and when the waist circumference is above its median, more subjects also have a hip circumference above its median (90 cm), a result that is probably to be expected.



[Figure 6: bar chart of counts by Category of Waist (lower than median; greater than or equal to median), with separate bars for Category of Hip (lower than median; greater than or equal to median).]

FIG. 6 BAR CHART OF FREQUENCY PARTITION AMONG 
WAIST AND HIP CIRCUMFERENCE CATEGORIES OF THE 
ANTHROPOMETRIC DATASET FROM 2010. COLUMNS 
REFLECT NUMBERS OF 104, 80, 84 AND 94 RESPECTIVELY (SEE 
TABLE A5 IN APPENDIX I). 

The initial dataset was in CSV (Comma Separated Values) format, a common input format for statistical programs. The data were imported into IBM SPSS Statistics 20 software, where variable computation, data preparation and the first comparisons were carried out. Next, to investigate w and the 1-β estimation for the chi-square test, we ran our MS Excel-based calculator and compared the results with those of GPower, R and (as this is a 2x2 contingency table) SAS.

The MS Excel-based calculator and GPower both gave w=0.093, which resulted in 1-β=0.427 in GPower (Figure 7). The same results were obtained in R with the script shown in Appendix III, which also produced the dot plot in Figure 8.

GPower can also plot the power calculated for given w, df and α against N. A comparison of power for significance levels α=0.01 and α=0.05 with increasing total sample size is presented for the anthropometric data in Figure 9. This shows more clearly that, with a stricter significance level (lower α) in the power analysis, more samples must be analyzed to reach the same power, if the other parameters remain constant.

As expected, Figures 8 and 9 show the same result for α=0.05, so the results of the two programs are comparable.



Conclusions 

When a clinically relevant effect size, serving as the "true" effect in the study, is unknown, the effect size can be calculated from a preliminary dataset (although this estimate may differ from the true effect size). Without previous results, w is generally calculated from the ongoing study, as if that w were the "truth". The software packages we used perform the same calculation, but the interpretation of the results should be carefully considered.

In conclusion, as using GPower for power analyses may be difficult for those without statistical expertise, we suggest an easy-to-use MS Excel calculator for 2x2, 2x3 and 2x4 contingency tables as a complement to other software. An R script for the 2x2 contingency table has been written, and these calculations have been applied to different practical studies. With such tools, it is not difficult to calculate the effect size and power for chi-square tests, and these calculations may feasibly be integrated into routine statistical analyses by researchers with biological or other backgrounds.



[Figure 7, left-hand panel: MS Excel sheet ES_2x2, "Effect size calculation of parameters for 2*2 groups by Cohen", preliminary study, with the instruction to fill in only the red-framed part of the sheet to get the effect size of the frequency data in the green cell. It shows the cross-classification table (grouping variable 1 by grouping variable 2), the probability of each cell with the preliminary observed values, the probability of no association between the 2 investigated factors, and the effect size (w). Sheet tabs: ES_2x2, ES_2x3, ES_2x4. Right-hand panel: GPower, Goodness-of-fit tests: Contingency tables, post hoc power analysis.]

FIG. 7 COMPARISON OF THE RESULTS OF EFFECT SIZE CALCULATION (MAGENTA) WITH THE MS EXCEL-BASED CALCULATOR (LEFT-HAND PANEL) AND GPOWER SOFTWARE (RIGHT-HAND PANEL) BASED ON THE CONTINGENCY TABLE (SET IN THE RED-FRAMED TABLE IN EXCEL, UPPER LEFT) OF THE ANTHROPOMETRIC DATA. PROBABILITIES CALCULATED FOR OBSERVED AND EXPECTED VALUES IN MS EXCEL CALCULATOR ARE FRAMED IN GREEN AND BLUE, RESPECTIVELY, AND ALSO SHOWN FOR GPOWER (RIGHT PANEL OF GPOWER). THE OBSERVED POWER ESTIMATION IN GPOWER SOFTWARE IS FRAMED IN ORANGE.
[Figure 8: dot plot titled "Power estimation with increasing number of cases for anthropometric frequency dataset"; horizontal axis: Number of students.]

FIG. 8 DOT PLOT COMPARING POWER ESTIMATION (VERTICAL AXIS) WITH INCREASING NUMBER OF CASES (HORIZONTAL AXIS) FOR ANTHROPOMETRIC DATA, DEMONSTRATING THAT TO ACHIEVE A POWER OF ABOUT 6 ALMOST 1000 CASES SHOULD BE ANALYZED UNDER THE SAME CONDITION (df=1, α=0.05, w=0.093).



[Figure 9: GPower plot, χ² tests, Goodness-of-fit tests: Contingency tables; power plotted against total sample size.]

FIG. 9 COMPARISON OF POWER ESTIMATION (VERTICAL AXIS) FOR α=0.01 (RED LINE) AND α=0.05 (BLUE LINE) WITH INCREASING TOTAL SAMPLE SIZE (HORIZONTAL AXIS) FOR ANTHROPOMETRIC DATA (df=1, w=0.0933643). THIS PLOT WAS GENERATED BY GPOWER.
APPENDIX I

Study 1: Alzheimer's Disease Data 

Table A1 is a contingency table for Finnish Alzheimer's disease (AD) vs. healthy control (HC) cases (columns) by DHCR24 rs600491 genotypes (rows). In each cell, the 1st row gives the number of females, the 2nd row the column total percentages and the 3rd row the table total percentages. This table was created in R software.

TABLE A1 CONTINGENCY TABLE FOR FINNISH AD VS. HC CASES BY DHCR24 RS600491 GENOTYPES. 1ST ROW: NUMBER OF FEMALE CASES, 2ND ROW: COLUMN TOTAL PERCENTAGES, 3RD ROW: TABLE TOTAL PERCENTAGES IN EACH CELL.



                Alzheimer
Genotypes       HC        AD        Row Total
CC              30        32        62
                0.109     0.111
                0.053     0.057
CT              118       139       257
                0.431     0.481
                0.210     0.247
TT              126       118       244
                0.460     0.408
                0.224     0.210
Column Total    274       289       563
                0.487     0.513



Table A2 is the contingency table for Hungarian Alzheimer's disease vs. healthy control cases (columns) by DHCR24 rs600491 genotypes (rows). In each cell, the 1st row gives the number of females, the 2nd row the column total percentages and the 3rd row the table total percentages. This table was created in R software.

TABLE A2 CONTINGENCY TABLE FOR HUNGARIAN AD VS. HC CASES BY DHCR24 RS600491 GENOTYPES. 1ST ROW: NUMBER OF FEMALE CASES, 2ND ROW: COLUMN TOTAL PERCENTAGES, 3RD ROW: TABLE TOTAL PERCENTAGES IN EACH CELL.



                Alzheimer
Genotypes       HC        AD        Row Total
CC              19        38        57
                0.137     0.189
                0.056     0.112
CT              73        102       175
                0.525     0.507
                0.215     0.300
TT              47        61        108
                0.338     0.303
                0.138     0.179
Column Total    139       201       340
                0.409     0.591



The calculation of w and 1-β for the chi-square test for the Hungarian Alzheimer dataset in R software is shown in Figure A1. The freq2 object contains the same contingency table as Table A2. In the prob3 object, we calculated the cell probabilities for the overall 340 cases. R calculated the same w and 1-β as GPower software (colored the same in the outputs of the two programs: magenta indicates w=0.071 and orange indicates 1-β=0.198; see Figure 2).



FIG. A1 EFFECT SIZE (MAGENTA-FRAMED) AND POWER (ORANGE-FRAMED) CALCULATION FOR HUNGARIAN ALZHEIMER'S DISEASE DATASET IN R SOFTWARE.

Study 2: Physiotherapeutic Data 

Table A3 is the contingency table for the physiotherapeutic data comparing the birth date grouping with gender. Overall, 29 cases were investigated between February 2 and April 1, 2011. This table was generated in IBM SPSS Statistics software.



> freqZ 

Al zhei mer 
Genotypes HC AD 
CC 19 38 
CT 73 102 
TT 47 61 

> prob3<-f req2/340 

> prob3 

Alzheimer 
Genotypes HC AD 

CC 0.05588235 0.1117647 
CT 0.21470588 0.3000000 
TT 0.13823529 0.1794118 

> ES.w2(prob3) 
[1] 0. 07080762 

> pwr . chi sq. test(w=ES. w2(prob3) , N=340, df=2, si g. 1 evel=0. 05 , power=) 

Chi squared power calculation 




NOTE : N is the number of observations 
TABLE A3 CONTINGENCY TABLE FOR THE COMPARISON OF
GENDER AND BIRTH DATE GROUPS FOR THE
PHYSIOTHERAPEUTIC DATASET MEASURED BETWEEN
FEBRUARY 2 AND APRIL 1 IN 2011.

Birth date * Gender Crosstabulation

                                               Gender
                                               Male      Female    Total
Birth date   After 1986    Count               7         8         15
                           Expected Count      7,8       7,2       15,0
                           % within Gender     46,7%     57,1%     51,7%
             Before 1987   Count               8         6         14
                           Expected Count      7,2       6,8       14,0
                           % within Gender     53,3%     42,9%     48,3%
Total                      Count               15        14        29
                           Expected Count      15,0      14,0      29,0
                           % within Gender     100,0%    100,0%    100,0%



The contingency table for the extended
physiotherapeutic dataset, comparing birth date groups
with gender, is shown in Table A4. Overall, 34 cases
were investigated between February 2 and April 19,
2011. This table was generated in IBM SPSS Statistics.

TABLE A4 CONTINGENCY TABLE FOR THE COMPARISON OF 
GENDER AND BIRTH DATE GROUPS FOR THE EXTENDED 
PHYSIOTHERAPEUTIC DATASET MEASURED BETWEEN 
FEBRUARY 2 AND APRIL 19 IN 2011. 



Birth date * Gender Crosstabulation

                                               Gender
                                               Male      Female    Total
Birth date   After 1986    Count               8         10        18
                           Expected Count      8,5       9,5       18,0
                           % within Gender     50,0%     55,6%     52,9%
             Before 1987   Count               8         8         16
                           Expected Count      7,5       8,5       16,0
                           % within Gender     50,0%     44,4%     47,1%
Total                      Count               16        18        34
                           Expected Count      16,0      18,0      34,0
                           % within Gender     100,0%    100,0%    100,0%



Calculations of w and 1-β for the chi-square test for the
preliminary Physiosensor frequency dataset in R are
shown in Figure A2. The structure is similar to that in
Figure A1. In this short R script, the freq_p1 object
holds the contingency table shown in Table A3, and the
prob_p1 object holds the probabilities over all 29 cases.
R calculated the same w and 1-β as GPower (the values
are colored the same in the outputs of the two programs:
magenta indicates w=0.105 and orange indicates
1-β=0.087; Figure 4).



> freq_p1
             Gender
Birthdate     Male Female
  After 1986     7      8
  Before 1987    8      6

> prob_p1
             Gender
Birthdate          Male    Female
  After 1986  0.2413793 0.2758621
  Before 1987 0.2758621 0.2068966

> ES.w2(prob_p1)
[1] 0.1047619

> pwr.chisq.test(w=ES.w2(prob_p1), N=N_p1, df=1, sig.level=0.05, power=)

     Chi squared power calculation

              w = 0.1047619
              N = 29
             df = 1
      sig.level = 0.05
          power = 0.08718513

NOTE: N is the number of observations



FIG. A2 EFFECT SIZE (MAGENTA-FRAMED) AND POWER 
(ORANGE-FRAMED) CALCULATION FOR PRELIMINARY 
PHYSIOTHERAPEUTIC DATASET (N=29) IN R SOFTWARE. 

Figure A3 shows the SAS code for a power analysis over
a range of sample sizes, based on the group proportions
of the 2x2 contingency table of the preliminary
Physiosensor dataset; the range includes the sample
size of the extended dataset (N=34).
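As an approximate cross-check of the kind of curve PROC POWER produces, the Python sketch below converts the two column proportions into Cohen's w and evaluates the df = 1 power on the same ntotal grid. It assumes two groups of equal size (the actual table has 15 and 14 cases), for which w = |p1 - p2| / (2 * sqrt(p_bar * (1 - p_bar))) with p_bar the mean of the two proportions. This is not SAS's internal algorithm, so the values may differ slightly from the SAS output; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def w_from_proportions(p1, p2):
    """Cohen's w for two equally sized groups with success proportions p1, p2."""
    p_bar = (p1 + p2) / 2
    return abs(p1 - p2) / (2 * sqrt(p_bar * (1 - p_bar)))


def chisq_power_df1(w, n, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality n * w**2."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * w ** 2)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Column proportions 7/(7+8) and 8/(8+6) from the SAS comment
w = w_from_proportions(7 / 15, 8 / 14)
# Same grid as ntotal=2 to 34 by 4 50 to 1700 by 50
grid = list(range(2, 35, 4)) + list(range(50, 1701, 50))
curve = [(n, chisq_power_df1(w, n)) for n in grid]
for n, power in curve:
    if n in (34, 500, 1000, 1700):
        print(n, round(power, 3))
```

With equal groups this w (about 0.1048) is very close to the 0.105 obtained from the observed table, so the curve passes close to the R result at N=34.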



/* Power analysis for 2 by 2 contingency table */

/* contingency table for physiosensor preliminary dataset:
   7 8
   8 6
*/

proc power;
   twosamplefreq test=pchi                /* Pearson chi-square test */
   alpha=0.05                             /* significance level */
   groupproportions=(0.467 0.571)         /* proportions for columns: 7/(7+8) 8/(8+6) */
   nullproportiondiff=0                   /* testing the null hypothesis: there is no difference in proportions */
   ntotal=2 to 34 by 4 50 to 1700 by 50   /* checking power estimation for different sample sizes */
   power=.;                               /* power estimation: 1-beta */
   plot x=n;                              /* horizontal axis represents the number of cases, vertical axis shows the power */
run;



FIG. A3 SAS CODE FOR POWER ANALYSIS FOR A 2x2 CONTINGENCY TABLE BASED ON THE PRELIMINARY PHYSIOTHERAPEUTIC
DATASET. THE POWER IS CALCULATED ON THE BASIS OF THE PROPORTIONS IN THE PRELIMINARY DATASET, BUT RUN FOR
THE NUMBER OF CASES (N=34) FROM THE EXTENDED DATASET.
Figure A4 presents the power calculation in R for the
extended (N=34) dataset, using the effect size based on
the preliminary physiotherapeutic data. The analysis
gives 1-β=0.0937.

> pwr.chisq.test(w=ES.w2(prob_p1), N=N_p2, df=1, sig.level=0.05, power=)

     Chi squared power calculation

              w = 0.1047619
              N = 34
             df = 1
      sig.level = 0.05
          power = 0.09372457

NOTE: N is the number of observations

FIG. A4 POWER ANALYSIS BASED ON w=0.105 FROM THE
PRELIMINARY DATA FOR THE EXTENDED (N=34) DATASET.
1-β=0.0937; CALCULATION WAS PERFORMED IN R.

Study 3: Anthropometric Data 

The contingency table of the 2010 anthropometric
dataset, comparing waist and hip circumference
groups, is shown in Table A5 (total number of cases:
362). Each cell contains the observed count, the
expected count and the percentage within the waist
category (row). This contingency table was created
with IBM SPSS Statistics 20.

TABLE A5 CONTINGENCY TABLE FOR THE
ANTHROPOMETRIC DATASET COMPARING THE WAIST AND
HIP CIRCUMFERENCE CATEGORIES (N=362).



Category of Waist * Category of Hip Crosstabulation

                                                          Category of Hip
                                                          Greater than or    Lower than
                                                          equal to Median    Median       Total
Category of Waist  Greater than or   Count                104                80           184
                   equal to Median   Expected Count       95,6               88,4         184,0
                                     % within Category    56,5%              43,5%        100,0%
                                     of Waist
                   Lower than        Count                84                 94           178
                   Median            Expected Count       92,4               85,6         178,0
                                     % within Category    47,2%              52,8%        100,0%
                                     of Waist
Total                                Count                188                174          362
                                     Expected Count       188,0              174,0        362,0
                                     % within Category    51,9%              48,1%        100,0%
                                     of Waist



Table A6 shows the results of the chi-square tests on
the 2010 anthropometric dataset. Pearson's chi-square
test gives an asymptotic p-value of 0.076. As none of
the cells in the contingency table has an expected
count less than 5, the chi-square test can be
interpreted. This table was made with IBM SPSS
Statistics 20.
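The expected counts, the Pearson statistic and the Yates-corrected statistic in Tables A5 and A6 follow from closed-form 2x2 formulas. The Python sketch below (helper names hypothetical) assumes the table is [[a, b], [c, d]] in the orientation of freq_a1 in Figure A5 and uses the 1-df identity P(chi-square > x) = 2 * (1 - Φ(sqrt(x))) for the asymptotic p-values; Fisher's exact test is not reproduced. It also shows that w = sqrt(chi-square / N) gives the effect size used in the power analysis.

```python
from math import sqrt
from statistics import NormalDist


def p_value_df1(x):
    """Upper-tail probability of a chi-square statistic x with 1 df."""
    return 2 * (1 - NormalDist().cdf(sqrt(x)))


def two_by_two_tests(a, b, c, d):
    """Expected counts, Pearson and Yates chi-square for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [[r * k / n for k in cols] for r in rows]  # E_ij = r_i * c_j / N
    denom = rows[0] * rows[1] * cols[0] * cols[1]
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / denom
    yates = n * max(diff - n / 2, 0) ** 2 / denom  # continuity correction
    return {
        "min_expected": min(min(row) for row in expected),
        "pearson": (pearson, p_value_df1(pearson)),
        "yates": (yates, p_value_df1(yates)),
        "w": sqrt(pearson / n),
    }


res = two_by_two_tests(94, 84, 80, 104)  # freq_a1 from Figure A5
print(round(res["min_expected"], 2))          # 85.56
print([round(v, 3) for v in res["pearson"]])  # [3.156, 0.076]
print([round(v, 3) for v in res["yates"]])    # [2.793, 0.095]
print(round(res["w"], 4))                     # 0.0934
```

The smallest expected count (85.56) is far above 5, which is the condition quoted above for interpreting the asymptotic test.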

TABLE A6 RESULTS OF CHI-SQUARE TESTS IN SPSS BASED ON 
THE CONTINGENCY TABLE SHOWN IN TABLE A5. 



Chi-Square Tests

                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3,156a   1    ,076
Continuity Correctionb         2,793    1    ,095
Likelihood Ratio               3,160    1    ,075
Fisher's Exact Test                                        ,092         ,047
N of Valid Cases               362

a. 0 cells (0,0%) have expected count less than 5. The minimum expected count is 85,56.
b. Computed only for a 2x2 table



APPENDIX II 

An MS Excel calculator of effect size for chi-square
power analysis for 2x2, 2x3 and 2x4 contingency tables
is available online (http://www3.szote.u-szeged.hu/dmi/Anna_Laszlo/Effect-size_calculator_for_Chi-square_power.xlsx).

APPENDIX III 

As a third appendix, Figure A5 provides an R script with
explanatory comments for effect size and power
estimation on the anthropometric data. The code has
5 main parts (each introduced by a comment beginning
with ###):

1. Creating the contingency table (freq_a1, a 2x2
matrix) by waist and hip circumference groups.
The user only has to set the cell counts of this
contingency table in r1c1_a1, r1c2_a1, r2c1_a1
and r2c2_a1, where r and c denote row and
column and the digits give the row and column
index. For example, r2c1 is the cell at the
intersection of the second row and the first
column of the contingency table. The suffix
"_a1" denotes that these counts come from a
preliminary study.

2. Running the chi-square test and comparing the
results with those from IBM SPSS Statistics
(Appendix I, tables of Study 3) to check the
calculations. The contingency table is checked
with the CrossTable() function from the gmodels
package, and the chi-square test assumption is
checked by calculating the expected values from
the contingency table. Fisher's exact test and the
chi-square test with Yates continuity correction
give the same results in both programs.

3. For the effect size calculation for the chi-square
test, the cell probabilities have to be calculated
(prob_a1, a 2x2 matrix), and the pwr package has
to be loaded to allow use of the ES.w2()
function for effect size estimation.

4. Power is estimated with the pwr.chisq.test()
function from the pwr package. To run this
estimation, the effect size (from the previous
calculation by the ES.w2() function), the degrees
of freedom of the contingency table ((number of
rows - 1)*(number of columns - 1)), the number of
cases (N_a1, the sum of the elements of the
contingency table) and the significance level (α,
set to 0.05) have to be set. If power is set
instead of the number of cases, the function
estimates how many cases are needed to reach a
given power (e.g. 0.8, 0.9 and 0.95), assuming
the same effect size, significance level and
degrees of freedom.

5. A dot plot was drawn showing the power against
an increasing number of cases for the same
study parameters (w=0.09336431, α=0.05, df=1, as
we have a 2x2 contingency table). This is
shown in Figure 8.
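Step 4 can be mirrored outside R for the df = 1 case. The Python sketch below (helper names hypothetical) computes the power at N_a1 = 362 and then finds, by bisection, the number of cases needed for a given power; bisection works because the power increases monotonically with N. It assumes w = 0.09336431 from ES.w2(prob_a1) and rounds the required N up to whole cases, which appears to be how the sample sizes in the comments of Figure A5 are reported.

```python
from math import ceil, sqrt
from statistics import NormalDist


def chisq_power_df1(w, n, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality n * w**2."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * w ** 2)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


def required_n(w, power, alpha=0.05, lo=1.0, hi=1e6, tol=1e-6):
    """Smallest real-valued n whose power reaches `power`, by bisection.

    Invariant: power at lo is below the target, power at hi reaches it.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chisq_power_df1(w, mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


w_a1 = 0.09336431  # ES.w2(prob_a1) in Figure A5
print(round(chisq_power_df1(w_a1, 362), 4))  # 0.4273
for target in (0.7, 0.8, 0.9, 0.95, 0.99):
    print(target, ceil(required_n(w_a1, target)))
```

pwr.chisq.test() returns N as a non-integer, so rounding up is needed to obtain a number of cases that actually reaches the target power.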
#2013-01-09

#Effect size and power estimation for chi-square test
#Anthropometric data

###2x2 contingency table for pilot study: dataset 2010 (a1: anthropometric dataset 1)
r1c1_a1=94 #element of first row and first column
r1c2_a1=84 #element of first row and second column
r2c1_a1=80 #element of second row and first column
r2c2_a1=104 #element of second row and second column

freq_a1<-matrix(c(r1c1_a1, r1c2_a1,
                  r2c1_a1, r2c2_a1),
                nrow=2, byrow=TRUE,
                dimnames=list(waistcircumference=c("< Median", "Median <="),
                              Hipcircumference=c("< Median", "Median <=")))
freq_a1
#                  Hipcircumference
#waistcircumference < Median Median <=
#         < Median        94        84
#         Median <=       80       104

###checking results in SPSS:
install.packages("gmodels") #install packages if needed
library(gmodels) #after installing gmodels package
CrossTable(freq_a1, prop.c=FALSE, prop.chisq=FALSE) #row %s are ok
fisher.test(freq_a1) #Fisher's Exact Test for Count Data p=0.09219, the same as in SPSS
chisq.test(freq_a1)$expected #the same as in SPSS
chisq.test(freq_a1) #X-squared = 2.7928, df = 1, p-value = 0.09469
#the same as in SPSS's continuity correction chi-square test

###Effect size calculation
#number of cases
N_a1=r1c1_a1+r1c2_a1+r2c1_a1+r2c2_a1 #362 - number of cases (overall frequency)
#probability of observed values in the contingency table (added up to 1; each is nij/N in cell ij):
prob_a1<-freq_a1/N_a1
prob_a1
#                  Hipcircumference
#waistcircumference  < Median Median <=
#         < Median  0.2596685 0.2320442
#         Median <= 0.2209945 0.2872928
library(pwr)
ES.w2(prob_a1)
#w=0.09336431

###Power and sample size estimation
#power estimation
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=N_a1, sig.level=.05, power=)
#power=0.4272621
#sample size estimation
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), sig.level=.05, power=0.7)$N  #709
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), sig.level=.05, power=0.8)$N  #901
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), sig.level=.05, power=0.9)$N  #1206
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), sig.level=.05, power=0.95)$N #1491
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), sig.level=.05, power=0.99)$N #2108

###Plot of power compared with increasing number of cases
N_all<-as.numeric() #number of cases
P_a1<-as.numeric() #power
for (i in 1:19) {
  N_all[i]=pwr.chisq.test(w=ES.w2(prob_a1), N=, df=1, sig.level=0.05, power=(0.01+i*0.05))$N
  P_a1[i]=0.01+i*0.05
}

plot(N_all, P_a1,
     main="Power estimation with increasing number of cases\nfor anthropometric frequency dataset",
     xlab="Number of students",
     ylab="Power estimation",
     type="p",
     col="dark red",
     xlim=c(0,2200),
     ylim=c(0,1))



FIG. A5 R SCRIPT FOR EFFECT SIZE AND POWER CALCULATION FOR THE ANTHROPOMETRIC DATASET 